One of the most fundamental problems in vision is segmentation: the way in which parts of an
image are perceived as a meaningful whole.

Recent work has shown how to calculate images of physical parameters from raw intensity data.
Such images are known as intrinsic images, and examples are images of velocity (optical flow),
surface orientation, occluding contour, and disparity. The principal difficulty with intrinsic images is
that each by itself is generally underconstrained; they can only be computed in parallel with each
other and with the use of parameters obtained through segmentation.

While intrinsic images are not segmented, they are distinctly easier to segment than the original
intensity image. If parts of these images are organized in some way, this organization can be
detected by a general Hough transform technique. Networks of feature parameters are appended to
the intrinsic image organization. Then the intrinsic image points are mapped into these networks.
This mapping will be many-to-one onto interesting parameter values. This basic relationship is
extended into a general representation and control technique with the addition of three main ideas:
abstraction levels; sequential search; and tight coupling. These ideas are a nucleus of a theory of
low-level and intermediate-level vision. This theory explains segmentation in terms of highly parallel
cooperative computation among intrinsic images and a set of parameter spaces at different levels of
abstraction.

1. Overview

One of the most troublesome puzzles in vision is how parts of an image are seen as a
meaningful whole or segment. This is known as the segmentation problem. The ambiguous use of
segment, which means part, to denote a whole, arises from the fact that a segment is an
intermediate component in a description which relates an object with an image. From the viewpoint
of the object description, the segment is a part. From the viewpoint of a group of image points with
common properties, the segment is a whole.

Parts of an image are seen as a segment if the corresponding physical object has common
physical properties, or features. For example, if a connected component of the image has a single
color, say red, then it may be seen as a segment. The patch of red arises from the physical object's
surface reflectance. Usually there are not one but several features which have the same spatial
registration. For example, an object may be moving, red, and a cube. Figure 1a shows this case.
Segmentation is more difficult when features are not spatially registered. Figure 1b shows a
multicolored cube. Which feature should be the most compelling, the color or the geometric lines
indicating the cube? In the general case this answer depends on the goals of the perceiver. Another
common problem occurs when an object is occluded (Figure 1c); a theory of low-level vision must
be able to explain how an object is seen as a segment when the features are only partially registered
or incomplete. Real image data is also noisy and many segments are only perceived owing to the
combination of weak evidence of several features. The evidence may be so weak that each feature, if
viewed in isolation, would be uninterpretable.

Figure  1 

We develop the nucleus of a theory of low-level and intermediate-level vision which explains
the above aspects of segmentation in terms of massively parallel cooperative computation [Rosenfeld
et al., 1976; Zucker, 1976; Marr, 1979] between two groups of networks. One group, intrinsic images
[Barrow and Tenenbaum, 1978], can be computed primarily in terms of local constraints. The other,
termed a feature space, can be computed primarily in terms of global mappings from intrinsic
images to feature space. Feature space itself may have many different levels of abstraction. Intrinsic
images and feature spaces are collectively called parameter networks because they both have a
common organization. That is, the network is an organization of basic units, each representing
values of a particular parameter. The simple structure of units simplifies the control task and also
makes the network representation easily extendable. The basic elements of the theory are the
following.

1)  The  cooperative  computation  of  several  intrinsic  images. 

Recent work has shown how to calculate intrinsic images from raw intensity
data. Examples are images of velocity (optical flow) [Horn and Schunck, 1980;
Ullman, 1977; 1979], surface orientation [Horn and Sjoberg, 1978; Ikeuchi, 1980],
occluding contour [Prager, 1980; Rosenfeld et al., 1976], and disparity [Marr and
Poggio, 1976; Barnard and Thompson, 1979]. Intrinsic images can be computed
independently under special conditions, but in general they are interdependent.

Intrinsic images are in concert with the hypothesis that the visual system builds
many intermediate descriptions from image data. These descriptions represent
important parameters such as velocity, depth, and surface reflectance explicitly, since
in the explicit form they are easier to map into object descriptions.

2) The extraction of useful parameters from intrinsic images.

If parts of the intrinsic image are organized in some way, this organization
can be detected by a general Hough transform technique [Duda and Hart, 1972;
Ballard, 1981a; Kender, 1978; Ohlander et al., 1979]. This is done by describing
the organization in terms of parameters and then mapping the intrinsic image
points into parameter space. The transformation will be many-to-one onto
parameter values which represent meaningful units. A major advantage of the
Hough transform is that it is relatively insensitive to occlusion and noise.

3)  Interactions  involving  several  levels  of  abstraction. 

The Hough transform is a way of seeing spatial information as a unit.
However, if the unit has a complex structure the mapping from space to unit can
be unmanageably complex. A way around this is to introduce units of
intermediate levels of abstraction [Sabbah, 1981; Ballard and Sabbah, 1981;
Kender, 1978]. This reduces a complex transform to several simpler transforms
between units at successively higher levels of abstraction.

4)  Focus-of-attention  mechanisms. 

Visual focus-of-attention can be partly explained as the conjunction of two
mechanisms: 1) the use of Hough transforms to modify sensor input; and 2) the
sequential application of Hough transforms.

5)  Coupling  between  intrinsic  images  and  parameters. 

In general, intrinsic images cannot be computed without global parameters.
At the same time, these global parameters are what we mean by seeing parts of
the intrinsic image as a segment. In these cases the intrinsic image and
parameters are said to be tightly coupled: although each cannot be computed
independently, they can be computed simultaneously [Ballard, 1981b; 1981c].

We re-emphasize that our interest is low-level vision. Thus in item (4) above, focus of attention is
interpreted in a narrow sense: visual features which are clear can help the recognition of other
features (or perhaps direct eye movements). We do not attempt to explain general plans and goals.

Representations  for  Parameter  Networks 

The basic element of a parameter network is a parameter node. A parameter node
represents a single parameter value and has an associated confidence. The value is a set of numerical
measurements for the node; the confidence is a measure of their believability. For example, if there
is an edge at (10,10) with orientation 30° and length 5 units, the vector value of the parameter node
representing the edge is (x,y,θ,s) = (10,10,30°,5). The associated confidence is a measure of the
fuzziness of this estimate. One way a confidence may be increased is if there are nearby edges of
the same orientation which align. Thus in Figure 2 the edges in (a) and (b) have the same value but
we can be more confident in case (b).

Figure  2. 
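As a rough illustration of the parameter node just described, the following Python sketch pairs a value vector with a confidence. The class name `ParameterNode`, the `reinforce` method, and the additive clamped update are assumptions made for this example; the text only states that a node has a value and a confidence.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ParameterNode:
    """One unit of a parameter network: a value vector plus a confidence in [0, 1]."""
    value: Tuple[float, ...]      # e.g. (x, y, theta_degrees, length) for an edge
    confidence: float = 0.0

    def reinforce(self, amount: float) -> None:
        # Hypothetical update rule: add support, then clamp to [0, 1].
        self.confidence = min(1.0, max(0.0, self.confidence + amount))


# The edge from the text: position (10, 10), orientation 30 degrees, length 5.
edge = ParameterNode(value=(10.0, 10.0, 30.0, 5.0), confidence=0.4)

# Aligned neighbouring edges (case (b) of Figure 2) would raise the confidence
# without changing the value.
edge.reinforce(0.3)
```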

This paper assumes a very simple model; namely, collections of value units. Each value unit is
connected to a subset of other value units, and can alter only those units. Underlying physical
principles determine the appropriate connection subsets. The confidence updating is done by
nonlinear relaxation. The overall structure of the paper is slanted towards abstractions of physical
principles; however, we also show how these principles are implemented in the networks.

2. Intrinsic Images

An intrinsic image is an image of some important parameter that is in registration with the
original intensity image [Barrow and Tenenbaum, 1978; Marr, 1979], that is, each parameter is
indexed by retinal coordinates. For example, in the velocity (optical flow) image, one is able to
compute at each point in time and for each spatial position a local velocity vector v(x,t). Figure 3
shows Horn's example for a rotating sphere [Horn and Schunck, 1980]. Intrinsic images may only be
computable over certain parts of the image, and over those parts the parameters are continuously
varying. While intrinsic images are not segmented into parts of objects, they are distinctly easier to
segment than the original intensity image. Other examples of such images are surface orientation,
occluding contour, and disparity.

Figure  3. 

Very recently there has been rapid progress in finding algorithms for computing intrinsic
images from intensity data. What is remarkable is that each such image type is computed in the
same manner. Two constraints, one derived from physical principles and the other from a constraint
that the resultant images should be locally smooth, suffice to specify a parallel-iterative algorithm.
Table 1 shows this commonality but is not an exhaustive list of approaches. See page 2 for
additional references.

Table 1: Intrinsic Images

While the above algorithms work well on images which are constrained to satisfy the underlying
assumptions, they may not work in the general case. Almost always there are free parameters or
boundary conditions which have to be determined independently.

2.1 Multi-Resolution Relaxation Methods

One general notion of "boundary condition" is image resolution. Previous methods for
computing intrinsic images have used a single image resolution, but in most situations this is
unrealistic. What is the correct resolution? At high resolution

• noise is a factor

• convergence is slow

• basic assumptions may not hold

To see the last point, imagine a surface with a micro-texture. At low resolution the surface
structure is blurred and simple reflectance models hold, but at high resolution the microstructure
can render such models useless. At low resolution

• noise is less of a factor

• convergence is fast

• basic assumptions may not hold

The last point arises from the fact that most intrinsic images are computed from constraints which
assume local variations are smooth. With increasing grid resolution, these assumptions are less
likely to be valid.


Hence a conjecture is that there is a range of resolutions for which the computations will be
valid. Furthermore, this range is expected to be spatially variant. A tool for exploring this conjecture
is multigrid relaxation techniques [Brandt, 1977], which have proven very useful for solving
differential equations. This model, together with reasoning from physical first principles, should
allow the determination of image-dependent grid resolutions for which intrinsic image computations
are valid. Multigrid techniques are of course related to pyramids [Tanimoto and Pavlidis, 1975;
Hanson and Riseman, 1978].
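The resolution hierarchy that pyramids and multigrid methods share can be sketched as follows. This is only an illustration of building coarser grids by 2x2 block averaging, assuming NumPy; it is not the multigrid relaxation of [Brandt, 1977] itself, and the function name `build_pyramid` is hypothetical.

```python
import numpy as np


def build_pyramid(image, levels):
    """Return [full-res, half-res, quarter-res, ...] using 2x2 block averaging."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        top = pyramid[-1]
        # Trim odd rows/columns so the grid divides evenly into 2x2 blocks.
        h, w = (top.shape[0] // 2) * 2, (top.shape[1] // 2) * 2
        if h == 0 or w == 0:
            break
        blocks = top[:h, :w].reshape(h // 2, 2, w // 2, 2)
        pyramid.append(blocks.mean(axis=(1, 3)))
    return pyramid


image = np.arange(16).reshape(4, 4)
pyramid = build_pyramid(image, levels=3)
```

An intrinsic image computation could then be run at each level, with the coarse levels giving fast, noise-tolerant estimates and the fine levels adding detail where the underlying assumptions still hold.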

2.2  Cooperative  Computation  of  Multiple  Intrinsic  Images 

Intrinsic images are logically computed simultaneously. In fact, they have to be; otherwise each
intrinsic image is underdetermined in the general case. (Only on certain synthetic images is the
computation well-defined.) Furthermore, they are highly interdependent, particularly at points of
discontinuity [Barrow and Tenenbaum, 1978]. For example:

• intensity edges can be indicative of depth discontinuities. Thus the edge image
is coupled to the disparity image;

• surface orientation is also indicative of depth discontinuity and is thus related
to the other two; and

• different objects which are moving relative to each other produce
discontinuities in the flow field.

By incorporating these couplings in the intrinsic image computations, one should find general
cases where the computations will converge. A separate issue is the behavior of the coupled
computations in the face of conflicting information.

2.3  Intrinsic  Images  at  Different  Levels  of  Abstraction 

The survey of intrinsic images (Table 1) excluded the fact that intrinsic images may have fine
structure involving several levels of abstraction. In fact, it seems likely that multiple abstraction
levels are necessary in many cases. For example, Zucker [1980] uses two levels of abstraction in
computing orientation intrinsic images, one for points of high gradients and the other for edge
segments. The computation of a velocity image in 3-d could involve three levels of abstraction:

• a change detection level where units are used for variations in intensity over
space and time ΔI/Δx', ΔI/Δy', ΔI/Δt (primes denote retinal coordinates);

• an optical flow level where units correspond to retinal velocities
(u(x',y'), v(x',y'));

• a 3-d flow level where units correspond to 3-d velocities
(v_x(x,y,z), v_y(x,y,z), v_z(x,y,z)).

The feasibility of computing the optical flow from change measures has been studied by
[Barnard and Thompson, 1979; Prager, 1980; Horn and Schunck, 1980]. The feasibility of computing
3-d flow is explored in [Ballard, 1981c].

2.4 Intrinsic Images and Parameter Nodes

Two models have been used to compute intrinsic images: 1) the value unit defined in Section 1
[Prager, 1980; Marr and Poggio, 1976]; and 2) a variable unit [Ikeuchi, 1980; Horn and Schunck,
1980]. In the first model there is a unit for every value of every variable; in effect the representation
has only constants. Constant value units may have outputs which are confidences between zero and
one. In the second model, each unit represents a variable which can take on values (the standard
method is to use an array for these units). The output is the value; there is no explicit notion of
confidence.


In general the unit/value representation is sufficient since problems formulated to use variables
can be transformed into unit/value problems in the following manner. Suppose x, y, and z satisfy a
relation R(x,y,z) = 0. Let us use a set of values A for x, B for y, and C for z. Where a ∈ A, we
would like C(a) to be 1 iff there exist b ∈ B and c ∈ C such that C(b) = 1, C(c) = 1, and R(a,b,c)
= 0. To implement this in a parameter network connect all pairs of (b,c) ∈ B×C to a value (a) if
R(a,b,c) = 0. Then starting with initial confidences, increment C(a) if there exist (b,c) such that
R(a,b,c) = 0 and C(b) + C(c) > some threshold. The individual values b and c may be treated
similarly.
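The update rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the relation x = y + z, the threshold of 1.5, the step size of 0.1, and the names `relax_step` and `conf` are all chosen for the example and do not come from the text. Only the units for x are updated here; the units for y and z would be handled the same way.

```python
from itertools import product


def relax_step(A, B, C, conf, R, threshold=1.5, step=0.1):
    """One synchronous update of the confidences of the value units for x.

    conf maps ('x', a), ('y', b), ('z', c) to confidences in [0, 1].
    A unit for a is incremented if some connected pair (b, c) with
    R(a, b, c) == 0 has combined confidence above the threshold.
    """
    new_conf = dict(conf)
    for a in A:
        supported = any(
            R(a, b, c) == 0 and conf[("y", b)] + conf[("z", c)] > threshold
            for b, c in product(B, C)
        )
        if supported:
            new_conf[("x", a)] = min(1.0, conf[("x", a)] + step)
    return new_conf


# Hypothetical relation R(x, y, z) = x - (y + z) = 0 over small discrete ranges.
R = lambda x, y, z: x - (y + z)
A, B, C = range(0, 7), range(0, 4), range(0, 4)

conf = {("x", a): 0.0 for a in A}
conf.update({("y", b): 0.0 for b in B})
conf.update({("z", c): 0.0 for c in C})
conf[("y", 2)] = 1.0   # y = 2 is known with full confidence
conf[("z", 3)] = 1.0   # z = 3 is known with full confidence

for _ in range(10):
    conf = relax_step(A, B, C, conf, R)
```

After the iterations only the unit for x = 5 has gained confidence, since (2, 3) is the only confident pair and it satisfies the relation only for that value.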

Note that the updating function is nonlinear when the underlying physical relation R is
nonlinear. If the relation R can be linearized then the cooperative computations can be shown to be
equivalent to linear programming [Hinton, 1979]. The linear case has also been analyzed by
[Hummel and Zucker, 1980].

3. Parameter Spaces

What does it mean to perceive parts of an image as a segment? In our theory, this perception
takes place if there is a parameter space such that each of the parts can have the same parameter
value. This general idea is illustrated by the following examples.

• Parts of a color image may be seen as a segment if they have the same color.
In this case the parameter space is a space of colors and the parts map into a
common point representing the common color.

• Parts of an optical flow image may be seen as a segment if they are part of a
rigid body that is moving. In this case the parameter space represents the rigid
body motion parameters of translational and rotational velocity and parts of
the image map into a common point in that space.

• Parts of edge and surface orientation images may be seen as a segment if they
are part of the same shape. This case is more complicated as there must exist
some internal representation of the shape. Given this representation, the
parameter space represents the transformation (scale, rotation, translation)
from the internal representation to the (viewer-centered) image representation.
Parts of the image which are seen as the shape have common values for these
parameters.

A general way of describing this relationship between parts of an image and the associated
parameters is the Hough transform [Hough, 1962; Duda and Hart, 1972; Kimme et al., 1975;
Shapiro, 1978]. In our low-level vision theory, Hough transforms relate intrinsic images and feature
spaces and feature spaces at different levels of abstraction. If the intrinsic image parameter is a
vector (x,a(x)) ∈ A and an element of feature space is a vector b ∈ B then there is usually a
physical constraint that relates a(x) and b, i.e., some relation f(a,b) such that

f(a,b) = 0.

The space A represents all possible intrinsic image values. A particular intrinsic image is
described by a set of values {a_k} where a_k = a(x_k). Now the set {a_k} is only consistent with
certain elements in the space B, owing to the constraint imposed by the relation f. This physical
constraint can be exploited in the following manner. For each a_k we can compute the set

B_k = {b | a_k and f(a_k,b) < δ_b}

Define H(b) as the number of times the value b occurs in the sets B_k. H(b) is the Hough transform
from the space A to the space B and is the number of points in intrinsic image space which are
consistent with the parameter value b. H(b) makes the most sense when the values of both (a(x),x) and
b are discrete. Hence the constant δ_b above is related to the quantization in the space B. H is also
best normalized by defining C(b) := H(b)/max_b H(b). In that case, the value C(b) can stand for the
confidence that the segment with feature value b is present in the image.

Concerning the implementation of Hough transforms in networks, B_k ⊂ B is the subset of B
units to which the a_k unit should be connected in the network. A separate H_max unit is needed for
normalization.
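The voting computation described here can be sketched for discrete spaces as follows. This is a minimal illustration, not the network implementation: the function name `hough_transform`, the use of |f| against a single tolerance `delta`, normalization by the largest count, and the straight-line example are assumptions made for the sketch.

```python
from collections import Counter


def hough_transform(a_values, b_values, f, delta):
    """Count, for each b, the number of a_k consistent with it.

    Returns (H, C): H[b] is the vote count and C[b] = H[b] / max H
    is the normalized confidence.
    """
    H = Counter()
    for a_k in a_values:
        # B_k: the parameter values consistent with this intrinsic image point.
        B_k = [b for b in b_values if abs(f(a_k, b)) < delta]
        H.update(B_k)
    h_max = max(H.values(), default=0)
    C = {b: (H[b] / h_max if h_max else 0.0) for b in b_values}
    return H, C


# Hypothetical example: points (x, y) vote for line parameters (m, c)
# through the relation f = y - (m*x + c) = 0.
points = [(0, 1), (1, 3), (2, 5), (3, 7), (2, 0)]          # last point is noise
params = [(m, c) for m in range(-3, 4) for c in range(-3, 4)]
f = lambda a, b: a[1] - (b[0] * a[0] + b[1])

H, C = hough_transform(points, params, f, delta=0.5)
best = max(params, key=lambda b: C[b])
```

The four collinear points all vote for (m, c) = (2, 1), while the noise point spreads its votes over parameters that receive little other support, which is the sense in which the transform is insensitive to noise.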

The Hough transform need not originate from intrinsic image space but can be defined
between any two spaces A and B as long as there is some relation f(a,b) = 0 for a ∈ A and b ∈ B.
To avoid describing the above computations in detail, we can use a shorthand notation for Hough
transforms. Each transform can be described as the triple

<a,b,f>

where the necessary computations are implicit. Note that the order of a and b is important in the
notation; in general, <a,b,f> is not equivalent to <b,a,f>.

As a very simple example of a Hough transform, we describe how a patch of red in an image
may be seen as a unit. For this to happen, an association is made between the spatially contiguous
points in the image and the particular value "red" in a parameter space of colors. There are
essentially three dimensions to color space. Although r-g-b is widely used in computer applications,
humans seem to use an opponent-process basis (r-g, y-b, white-black) [Hurvich and Jameson, 1957].
One (admittedly overly simplified) way of transforming from (r,g,b) space to opponent color space
is to use the following linear transformation:

\[
\begin{bmatrix} rg \\ yb \\ bw \end{bmatrix}
=
\begin{bmatrix} 1 & -2 & 1 \\ -1 & -1 & 2 \\ 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} r \\ g \\ b \end{bmatrix}
\qquad (3.1)
\]

Thus the Hough transform is given by <a,b,f> where

a = (r(x,y), g(x,y), b(x,y))

b = (rg, yb, bw)

and

f = Ta - b

where T is the matrix defined by Eq. 3.1.

For a red spot on a green background there are two values of color parameters which have
high values for C(b): red and green. The rest of the color Hough transform has low values. Figure 4
shows this idea, which has been used by [Hanson and Riseman, 1978; Ohlander et al., 1979],
applied to a color image.

Figure 4: Hanson and Riseman's
Segmentation in Color Space.
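The red-spot example can be sketched as a histogram over opponent-color space. This is a minimal illustration assuming NumPy; the matrix entries are taken as read from Eq. 3.1 in a degraded source and should be treated as an assumption, and the bin width, synthetic image, and function name `color_hough` are hypothetical.

```python
import numpy as np

# Matrix of Eq. 3.1 as read from the source (rows: rg, yb, bw); assumed values.
T = np.array([[ 1, -2, 1],
              [-1, -1, 2],
              [ 1,  1, 1]])


def color_hough(image_rgb, bin_width=64):
    """Map each pixel into quantized opponent-color space and count votes.

    Returns {bin: C} where C is the vote count divided by the largest count.
    """
    pixels = image_rgb.reshape(-1, 3).astype(int)
    opponent = pixels @ T.T                          # one (rg, yb, bw) row per pixel
    bins = np.floor_divide(opponent, bin_width)      # quantize the parameter space
    keys, counts = np.unique(bins, axis=0, return_counts=True)
    conf = counts / counts.max()
    return {tuple(int(v) for v in k): float(c) for k, c in zip(keys, conf)}


# Synthetic 8x8 image: a 3x3 red spot on a green background.
image = np.zeros((8, 8, 3), dtype=np.uint8)
image[:, :] = (0, 200, 0)
image[2:5, 2:5] = (200, 0, 0)

C = color_hough(image)
```

Only two bins of the color parameter space receive votes, one for the background and one for the spot; every other bin stays at zero, mirroring the two-peak structure described in the text.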

To show that intrinsic images and parameter spaces may be related in more complicated ways,
we briefly describe an example of how a specific two-dimensional shape is detected by specifying a
Hough transformation from edge space (local linear edges detected with a standard edge detector) to
a four-dimensional parameter space consisting of local origin coordinates, rotation and scale. Both
the color-space example and this one have the same solution at an abstract level. In each case there
is a transformation from intrinsic image space to parameter space that segments the image. In the
first case, points in the color image have the same color values. In the second case, points in the
edge image have the same shape parameter values. In fact, almost all segmentation problems can be
characterized in this fashion.


Table 2 shows some other Hough transforms.

Table 2: Hough Transforms

Intrinsic Image         Hough Transform

Optical Flow            • Heading
                        • Rotation of 3-d rigid body

Surface Orientation     • Illumination angle
                        • Shape

Occluding Contour       • Shape
                        • Surface orientation

Disparity               • Segments of constant disparity

Color                   • Segments of constant color

The two-dimensional shape example shows the general feature of Hough transforms: if the
algorithms are completely parallel, the space required is exponential in the number of parameters.
This can lead to immense space requirements. For example, consider an eight-parameter space of
100 discrete values for each parameter. The total number of parameter nodes required to represent
the space is 100^8! Fortunately this problem can generally be alleviated by detecting groups of
parameters sequentially. The example of 2-d shape detection is reconsidered in Section 3.2 to
illustrate this extremely powerful decomposition technique.

3.1  Detecting  Two-Dimensional  Shapes 

Two-dimensional shapes can be found from a primal sketch [Marr, 1978] by encoding the shape
information in constraint tables [Ballard, 1981a]. Consider the case where an object being sought has
no simple analytic form, but has a particular silhouette. Suppose for the moment that the object
appears in the image with known shape, orientation, and scale. (If orientation and scale are
unknown, they can be handled as additional parameters, as we will show.) Now pick a coordinate
system for the silhouette and draw a line to the boundary from the coordinate system origin. At the
boundary point we can compute the gradient direction and length and store the reference point as a
function of this information. Thus it is possible to precompute the location of the reference point
from boundary points given the gradient angle. The basic strategy of the Hough technique for
shapes is to compute the loci of points in parameter space from an edge in image space and
increment those points in an array. Figure 5 shows the relevant geometry.

Figure 5: Geometry for the Hough Transform.

In this case the reference point coordinates (xc,yc) are the only parameters (remember, rotation and
scaling have been fixed). Thus if we encounter in an image an edge point (x,y) with gradient
orientation (φ) and span (l) we know that the possible reference points are at

(x + r(φ,l)cos(α(φ,l)), y + r(φ,l)sin(α(φ,l)))

and so on.

Thus we can describe the generalized Hough algorithm as follows:

Generalized Hough for Shapes

Step 0. Make a table for the shape to be located like that shown in Figure 2.

Step 1. Form an array of possible reference points
        H(xc_min:xc_max, yc_min:yc_max) initialized to zero.

Step 2. For each edge do the following:

    Step 2.1. Compute φ(x), l(x).

    Step 2.2a. Calculate the possible centers, i.e., foreach table entry for (φ,l)
               compute

               xc := x + r(φ,l)cos(α(φ,l))
               yc := y + r(φ,l)sin(α(φ,l))

    Step 2.2b. Increment the array
               H(xc,yc) := H(xc,yc) + 1

Step 3. Possible locations for the shape are given by maxima in the array H.
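The steps above can be sketched in Python for the fixed-scale, fixed-rotation case. This is an illustrative sketch only: the function names, the use of a quantized (φ, l) pair as a dictionary key, the rounding of votes to integer cells, and the four-point template are assumptions for the example rather than details from the text.

```python
import math
from collections import defaultdict


def build_r_table(boundary, reference):
    """Step 0: for each boundary point store (r, alpha) to the reference point,
    indexed by the edge's quantized (phi, l) key."""
    table = defaultdict(list)
    xr, yr = reference
    for x, y, key in boundary:
        dx, dy = xr - x, yr - y
        table[key].append((math.hypot(dx, dy), math.atan2(dy, dx)))
    return table


def generalized_hough(edges, table, width, height):
    """Steps 1-3: accumulate votes for reference points and return the peak."""
    H = [[0] * width for _ in range(height)]            # Step 1
    for x, y, key in edges:                             # Step 2 (key from Step 2.1)
        for r, alpha in table.get(key, ()):             # Step 2.2a
            xc = round(x + r * math.cos(alpha))
            yc = round(y + r * math.sin(alpha))
            if 0 <= xc < width and 0 <= yc < height:
                H[yc][xc] += 1                          # Step 2.2b
    # Step 3: location of the largest accumulator value.
    _, peak = max((H[y][x], (x, y)) for y in range(height) for x in range(width))
    return H, peak


# Hypothetical template: four boundary points around the reference point (0, 0),
# each tagged with a (phi in degrees, l) key.
template = [(-2, 0, (180, 1)), (2, 0, (0, 1)), (0, -2, (270, 1)), (0, 2, (90, 1))]
table = build_r_table(template, reference=(0, 0))

# The same shape translated to (10, 7), plus one unrelated edge.
edges = [(8, 7, (180, 1)), (12, 7, (0, 1)), (10, 5, (270, 1)), (10, 9, (90, 1)),
         (3, 3, (0, 1))]
H, peak = generalized_hough(edges, table, width=16, height=16)
```

All four shape edges vote for the same reference point, so the accumulator peaks at the translated position, while the unrelated edge contributes a single isolated vote.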

In terms of our Hough transform notation, the transform is of the form

<(φ(x,y),l(x,y),x,y),(xc,yc),Γ>

where Γ is the constraint relation between (φ(x,y),l(x,y),x,y) and (xc,yc) shown by Figure 5. Also the
inner loop of the algorithm (Step 2.2) computes B_k given an edge a_k. The outer loop (Step 2)
computes H(b). The results of using this transform to detect a shape are shown in Figure 6.
Figure 6a shows an image of shapes. The R-table has been made for the middle shape. Figure 6b
shows the Hough Transform for the shape, i.e., H(xc,yc) displayed as an image. Figure 6c shows the
shape given by the maxima of H(xc,yc) overlaid on top of the image.

Figure 6: Applying the Generalized Hough Technique.
(a) Synthetic image, (b) Hough Transform
H(xc,yc) for middle shape.

What about the parameters of scale and rotation, s and θ? These are readily accommodated by
expanding the accumulator array and doing more work in the incrementation step. Thus, in Step 1,
the accumulator array is changed to

H(xc_min:xc_max, yc_min:yc_max, s_min:s_max, θ_min:θ_max)

and Step 2.2a is changed to

foreach table entry for (φ,l) do
    foreach s and θ
        xc := x + r(φ,l) s cos(α(φ,l)+θ)
        yc := y + r(φ,l) s sin(α(φ,l)+θ)

Finally, Step 2.2b is now

H(xc,yc,s,θ) := H(xc,yc,s,θ) + 1

Now the transform is given by <(φ(x,y),l(x,y),x,y),(xc,yc,s,θ),Γ'> where Γ' incorporates the rules
for computing s and θ. Notice that this algorithm is notionally parallel since all the incrementations
are independent, and that the space required is exponential in the number of parameters.

3.2 Feature Space Decompositions

In the example of Section 3.1, a particular shape is found by a notionally parallel transform
from edge space (φ(x,y),l(x,y),x,y) to a four-dimensional shape space (xc,yc,s,θ). However, time can
be traded for space by finding groups of these parameters sequentially. The advantage of the
sequential search is that the dimensionality of the computation at each stage is much less than the
single computation involving all of the parameters simultaneously [Ballard and Sabbah, 1981]. For
example, where N_xy, N_s, and N_θ are the sizes of the spaces (xc,yc), s, and θ respectively, searching
for a particular shape's parameters in the order (s,θ) and (xc,yc) requires parameter space equal to
N_s N_θ + N_xy instead of N_s N_θ N_xy. The Hough transform for the individual group is still notionally
parallel, so the time needed in the sequential transform is only proportional to the number of
parameter groups. In the shape example, the number of groups is two.
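The space saving can be made concrete with a small calculation. The sketch below compares the number of accumulator cells for a fully parallel transform with that of a sequential search over parameter groups; the particular sizes (a 100 x 100 position grid, 10 scales, 36 orientations) and the function names are hypothetical values chosen for illustration.

```python
from math import prod


def parallel_cells(groups):
    """Cells needed when all parameters share one accumulator array."""
    return prod(prod(sizes) for sizes in groups)


def sequential_cells(groups):
    """Cells needed when each parameter group has its own accumulator array."""
    return sum(prod(sizes) for sizes in groups)


# Groups searched in the order (s, theta) then (xc, yc).
shape_groups = [(10, 36), (100, 100)]

full = parallel_cells(shape_groups)       # N_s * N_theta * N_xy
staged = sequential_cells(shape_groups)   # N_s * N_theta + N_xy

# The eight-parameter example with 100 values per parameter, fully parallel.
eight_param = parallel_cells([(100,) * 8])
```

With these sizes the sequential search needs 10,360 cells rather than 3,600,000, and the eight-parameter space would need 10^16 cells if searched in a single array.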

To see how scale and orientation can be detected independently, consider a table that encodes
the orientation of the edge with respect to the silhouette's coordinate system. For example, for each
edge (φ,l) encode the angle necessary to rotate the edge clockwise so that it is parallel to the x-axis.
Using this table, the algorithm is as follows:

Hough Algorithm for Orientation and Scale:

Step 0. Make an orientation table as a function of φ and l.

Step 1. Form an array of possible scale-orientation pairs H(0:2π, s_min:s_max).

Step 2. For each edge do the following:

    Step 2.1 Compute φ(x), l(x).

    Step 2.2 Foreach s do the following:

        (a) look up the table entry α(φ(x), s*l(x)).

        (b) increment the array
            H(α,s) := H(α,s) + 1.

Step 3. Possible orientations and scales are given by maxima in the array H.

The value of sequential searches through parameter space becomes even more important in 3-d
since this case requires seven parameters: three positional coordinates; three orientation angles; and
a scale factor. The sequential Hough-shape transform extends readily to 3-d and has been used to
detect polyhedra [Ballard and Sabbah, 1981] using the constraints of [Kanade, 1978; 1979].

The previous example is for a single shape. For N shapes, given that the search is in parallel, a
size factor of N is added to the search space. To cut down on the impact of this factor one needs a
shape taxonomy like that of Bribiesca [Bribiesca and Guzman, 1979] where all shapes can be
described as a branch in a single shape tree. The advantage of the shape tree is that rather than
looking for all N shapes in parallel, the search can be partitioned into searches of spaces of size N1,
N2, etc., where the sum of these is roughly equivalent to log(N).


4.  Hierarchies of Abstraction Levels


The value of using several hierarchical levels of abstraction in vision is that the interaction
between levels is simplified. This does not mean that high-level descriptions cannot influence low-
level descriptions, or that the entire computations are not carried out in parallel. Rather, each
descriptive level can only influence nearby levels. In Sabbah [1981], the limitation is to levels
directly above and below. Other levels are influenced indirectly. The implication for the Hough
transforms, which specify the constraints between levels, is that the constraint relationships between
levels involve only a few parameters. This is an especially important feature, since the space
required by the Hough transform is exponential in the number of parameters, as are the sets.
Different levels of abstraction have been used by [Hanson and Riseman, 1978]. Examples using the
Hough transform may be found in [Sabbah, 1981; Kender, 1978]. Sabbah uses four levels such as
those shown in Figure 7 to recognize origami world figures.


Figure 7.

To show an example in detail, Kender's technique for detecting vanishing points in an image
from oriented line segments [Kender, 1978] is described. Line segments which are part of a
given vanishing point form a radial field which emanates from the point. Different vanishing points
have different sets of associated radial line segments (Fig. 8).

Figure  8. 

This example is interesting since the same situation occurs with respect to optical flow due to pure
translation. If the objects in the image are stationary with respect to a translating observer, then the
flow vectors will be emanating radially from a "focus-of-expansion" (FOE) in the direction of
motion. Objects translating with respect to the observer's frame will produce their own flow
emanating from a different FOE (Fig. 8).

This example involves two levels of abstraction. The first transforms collinear edge segments
into points (representing lines). Radial sets of edge elements correspond to circles through the origin
in line-space. Thus the second transformation is from circles in line-space to points in radial-field
space.


The first level is easy if a (ρ,θ) line space is used where
ρ = x cos θ + y sin θ.

Since an edge element has direction α (Fig. 9), each such element maps onto precisely one point in
(ρ,θ) space: (x cos α + y sin α, α). Thus the Hough transform, in the notation of Section 3, is:


<(x, y, α(x,y)), (ρ,θ), {θ = α; ρ = x cos α + y sin α}>.

Figure  9. 


Now maxima in C(ρ,θ) correspond to lines in the image. Also, radial lines will form a circle of
local maxima in (ρ,θ)-space. To see this note that the triangle OPQ in Figure 9b is always a right
triangle, and therefore OQ must be the diameter of a circle. Note that this circle is constrained to
go through the origin so that its diameter must be on the line

ρ/2 = a cos θ + b sin θ

where (2a,2b) is the location of the focus of expansion (or vanishing point). Thus the second
transform is

<(ρ,θ), (a,b), (ρ/2 = a cos θ + b sin θ)>.
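
The two levels can be chained in a short sketch. It is a simplified illustration under several assumptions: the edge elements are synthetic, the intermediate step of picking maxima in line space is skipped (every element's line-space point votes directly), and the (a,b) accumulator uses integer cells. The function names are hypothetical.

```python
import math
from collections import Counter


def to_line_space(x, y, alpha):
    """First level: an edge element (x, y, alpha) maps to one (rho, theta)."""
    return x * math.cos(alpha) + y * math.sin(alpha), alpha


def radial_field_hough(lines, a_values):
    """Second level: vote along rho/2 = a cos(theta) + b sin(theta)."""
    H = Counter()
    for rho, theta in lines:
        if abs(math.sin(theta)) < 1e-6:
            continue  # sketch only: skip lines for which b cannot be solved
        for a in a_values:
            b = (rho / 2 - a * math.cos(theta)) / math.sin(theta)
            H[(a, round(b))] += 1
    return H


# Synthetic edge elements on five lines through the hypothetical point (4, 6).
px, py = 4.0, 6.0
edges = []
for theta in (0.3, 0.8, 1.3, 1.9, 2.5):
    for t in (-1.0, 2.0):  # two elements per line, offset along the line
        edges.append((px - t * math.sin(theta), py + t * math.cos(theta), theta))

lines = [to_line_space(x, y, alpha) for x, y, alpha in edges]
H = radial_field_hough(lines, a_values=range(-5, 6))
(a, b), votes = H.most_common(1)[0]
print((2 * a, 2 * b), votes)  # (4, 6) 10
```

All ten edge elements agree on the cell (a,b) = (2,3), so the recovered point (2a,2b) is the common intersection of the radial lines.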


Implementation in Parameter Networks


In our earlier definition of the Hough transform, we assumed that the measurements a_k all had
confidence equal to unity. With higher level of abstraction Hough transforms, this may no longer be
the case. This is easily handled by keeping track of the confidences in the set B_k, i.e.,

B_k = {(b,C) | f(a_k,b) < δ and C = C(a_k)}.

Then H(b) is the sum of the confidences associated with the value b in the sets B_k.
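
A minimal sketch of a confidence-weighted accumulator follows. The `constraint` callable, which lists the parameter values compatible with a measurement, is a hypothetical stand-in for the constraint relation between the two spaces; the toy data are invented.

```python
from collections import defaultdict


def weighted_hough(measurements, constraint):
    """Accumulate H(b) as a sum of confidences rather than a count.

    measurements: iterable of (a_k, confidence) pairs.
    constraint:   hypothetical callable mapping a_k to the parameter values b
                  that it is compatible with.
    """
    H = defaultdict(float)
    for a_k, confidence in measurements:
        for b in constraint(a_k):
            H[b] += confidence
    return dict(H)


# Toy constraint: each measurement supports only its own rounded value.
votes = weighted_hough([(2.1, 0.9), (1.8, 0.5), (7.2, 0.3)],
                       constraint=lambda a: [round(a)])
print(votes)
```

With unit confidences this reduces to the ordinary counting transform.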


5.  Focus-of-Attention

Previously, intrinsic image to feature space transforms used single Hough transforms. We are
now ready to tackle issues which arise when several Hough transforms are used. First we show that
multiple Hough transforms can be invoked in parallel to resolve the problem of detecting a unit
with multiple features. This is done via the mechanism of a context Hough transform which is the
sum of individual Hough transforms. Next, we describe a focusing mechanism which exploits the
fact that an ambiguity in one space may be resolved in another. This technique allows the detection
of arbitrarily fine detail. Attention can be directed from a unit to its subparts and back again via a
mechanism termed sequencing.

5.1  Spatial  Context 

If a unit has multiple spatially registered features, these can be detected by applying two
different sets of Hough transforms. The Hough transform defined in Section 3 is bottom-up:
points in the intrinsic image space determine plausible sets of points in feature space. The
complementary transform is top-down; points in feature space determine plausible sets of points in
intrinsic image space. Formally, given b_k ∈ B, we compute

A_k = {a | b_k and f(a,b_k) < δ}

H(x) is the number of times the value a(x) occurs in the sets A_k. The mapping which defines H(a) is
likely to be one to many and furthermore, for a given feature, different b_k's should give rise to
disjoint subsets of A. Owing to this last point, it is intuitively appealing to deal with H_s(x) which is
simply the sum of the confidences of different values of the parameters a1, a2, ... which are at the
same spatial location x,y, i.e.,

H_s(x) = Σ_a H(a,x)

An  Example 

Consider again the image of a red spot on a green background, where the spot takes up one-
third of the image pixels. Then the transform H(b) where b = (r,g,b) has two peaks and is zero
everywhere else, i.e., for four-bit color scale accuracy

H(b) = 1    if b = (0,15,0)
       1/2  if b = (15,0,0)
       0    otherwise

Now consider b_k = (15,0,0) and compute H(a,x). This is given by

H(a,x) = 1  if x in spot and a = RED
         0  otherwise

A point in A represents the single color red and so H(x) in this case is

H(x) = 1  if x in spot
       0  otherwise


The transform H(x) is called the spatial context transform for reasons that will become more
apparent when we discuss focus of attention. The effect of this transform is to place an imaginary
filter in front of the sensors. In the above case, only sensors that are spatially registered with RED
sensors would receive input.
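
The red-spot example can be reproduced with a toy image. In this sketch the image is a hypothetical 3 x 3 grid whose middle row is the spot, and normalizing H(b) by its largest peak is an assumption chosen so that the values match the 1 and 1/2 quoted above.

```python
from collections import Counter

RED, GREEN = (15, 0, 0), (0, 15, 0)


def bottom_up(image):
    """H(b): color histogram, normalized so the largest peak equals 1."""
    counts = Counter(image.values())
    top = max(counts.values())
    return {color: n / top for color, n in counts.items()}


def top_down(image, b_k):
    """H(x): 1 at locations whose color matches the selected peak b_k, else 0."""
    return {x: 1 if color == b_k else 0 for x, color in image.items()}


# A 3 x 3 toy image whose middle row (one third of the pixels) is the red spot.
image = {(r, c): RED if r == 1 else GREEN for r in range(3) for c in range(3)}
print(bottom_up(image))        # {(0, 15, 0): 1.0, (15, 0, 0): 0.5}
context = top_down(image, RED)
print(sum(context.values()))   # 3
```

The `context` map plays the role of the imaginary filter: it is nonzero only at the three spot locations.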

Multiple  Features 

Now consider the case where multiple features are present in the image. The individual
bottom-up transforms for different features can be applied in parallel to compute H1(b1),
H2(b2), ..., Hm(bm) (m feature spaces). Now maxima in each of these spaces can be used to
compute individual top-down transforms H_s1(x), H_s2(x), ..., H_sm(x). The generalized spatial-context
transform H(x) is simply the normalized sum of these individual transforms, i.e.,

H(x) = (1/m) [H_s1(x) + H_s2(x) + ... + H_sm(x)]

Now the value of H(x) at a point x is the fraction of the maximum number of spatially
registered features that are present. High values of H correspond to spatially registered intrinsic
image points which each have been grouped into a unit by a separate bottom-up transform. Thus
H(x) represents a possible solution to the multiple-feature problem.
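
As a small illustration of the normalized sum, the sketch below averages m top-down maps location by location. The two maps and their values are hypothetical.

```python
def generalized_spatial_context(top_down_maps):
    """H(x) = (1/m) * sum of the m individual top-down maps."""
    m = len(top_down_maps)
    locations = set().union(*top_down_maps)
    return {x: sum(h.get(x, 0) for h in top_down_maps) / m for x in locations}


# Hypothetical top-down maps for two modalities over four locations.
color_map = {0: 1, 1: 1, 2: 0, 3: 0}    # locations matching the color peak
motion_map = {0: 0, 1: 1, 2: 1, 3: 0}   # locations matching the velocity peak
H = generalized_spatial_context([color_map, motion_map])
print(H[1], H[0], H[3])  # 1.0 0.5 0.0
```

Location 1 carries both features and scores 1.0; locations with only one of the two features score 0.5.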

5.2  Subspaces and Sequencing

In our formalism, a segment in an image is ideally represented as a conjunction of Hough
transform maxima. Each set of maxima corresponds to an organization with respect to a given
modality: color, velocity, etc. In the previous section we showed how the parallel generation of these
maxima could be used to discover regions in the image corresponding to multi-modal units.
Unfortunately, this technique will usually be inadequate because the unit is not manifested as a
clear maximum in all the modalities. As an example, consider a light-blue, moving unit, against a
background of other units, none of which are light-blue, but which are moving. In the color space,
the unit is clearly revealed; light-blue units have high confidence values (Fig. 10).

Figure  10. 

In velocity space, however, there is no clear maximum owing to the presence of other moving units.

The fundamental problem is that each modality consists of a projection of feature space. In the
high-dimensional space consisting of the concatenation of all the individual dimensions of each
modality, each unit would appear as a distinct maximum. The visual system model is structured to
examine only the subspaces of the individual modalities. The principal reason for this is economy;
the space requirement increases exponentially with the number of modalities.

This problem can be surmounted if the different parameter spaces are examined sequentially.
First the parameter spaces are examined for maxima. The most distinct maximum is picked and its
inverse Hough transform, C(x), is generated. This transform can be used to block input from sensors
positioned at its low confidence values. To see how this might work, let us reconsider the previous
example of the light-blue, moving unit. In color space there is a clear maximum corresponding to
light-blue. This value is used to generate C_light-blue(x), which blocks sensors that are
not spatially registered with light-blue color input. The net effect is that in velocity space there is
now a clear maximum as input from other units has been blocked.
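
The blocking effect can be illustrated with a toy scene. Everything here is hypothetical: the three units, their feature values, and the choice of light-blue as the distinct color value are assumed for the sake of the sketch, and the mask stands in for C(x).

```python
from collections import Counter


def histogram(units, key, mask=None):
    """Hough-style histogram of one modality, optionally gated by a mask C(x)."""
    return Counter(u[key] for x, u in units.items()
                   if mask is None or mask[x])


# Hypothetical scene: location -> features. Only location 0 is light-blue,
# and every unit moves, so velocity space alone is ambiguous.
units = {
    0: {"color": "light-blue", "velocity": (1, 0)},
    1: {"color": "red", "velocity": (0, 1)},
    2: {"color": "green", "velocity": (-1, 0)},
}
print(histogram(units, "velocity"))   # three equal peaks, no clear maximum

# Sequencing: select the color value, build C(x), then re-examine velocity.
mask = {x: u["color"] == "light-blue" for x, u in units.items()}
print(histogram(units, "velocity", mask))   # Counter({(1, 0): 1})
```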

5.3  Multiple, Spatially-Registered Features

Sequencing solves the problem of building up coherent groups of features, but has
drawbacks. For example, if the "blue," "moving," "horizontal" object were a "frisbee," one would
like this percept to be triggered via a Hough-like transform. However, in the sequencing example,
there is initial evidence for all light-blue objects, and this is a very large set. Worse, the percept
"frisbee" could be triggered by non-spatially registered groups of "blue" and "moving" inputs.
There is a solution to these problems if we assume that, in general, actual occurrences of features
will be sparse. In other words, in a given image there should not be two very similar colors
associated with different objects. If there are, our Hough transform model will only be able to
concentrate on one of them at a time.

The solution is due to [Feldman and Ballard, 1981]. We will develop it here in terms of our
Hough transform formalism in stages. First we formally concatenate parameter spaces. Next we
describe low-resolution concatenated spaces. Finally, we consider low-resolution parameter spaces
which can be tuned to specific parameter values.

Ideally, one could resolve the spatial registration problem by concatenating feature spaces. For
example, concatenating color space with motion space leads to

where elements (b_c, b_m) in the expanded space are only included if the input features are
spatially registered. While this is simply described in symbolic form, it is also impractical since
the parameter spaces for the combined-modality elements are impractically large. A partial solution
to the size problem is to decrease the number of parameter nodes. Let b_c', b_m' be values for color
and motion parameters respectively in the low-resolution spaces. Then the low resolution Hough
transform is given by

B_k = {(b_c', b_m') | a_kc(x_k), a_km(x_k), f_c(a_kc, b_c') < Δ_c ∧ f_m(a_km, b_m') < Δ_m}  (5.1)

where the bounds Δ_c and Δ_m are larger to account for the lower resolution in parameter space. The
grain of the low-resolution space can always be chosen to make the transform practical in terms of
space. However, now groups of parameters that are sufficiently similar may be transformed into the
same parameter node via Eq. (5.1). To resolve this problem we use a two-tiered transform,
consisting of high-resolution single-modality transforms and low-resolution multi-modality
transforms. Using the single-modality transforms, we select maxima {b_c*} and {b_m*} such that

b_c* = max_{b_c} {b_c ∈ b_c' ± .5Δ_c}
and

b_m* = max_{b_m} {b_m ∈ b_m' ± .5Δ_m}.

These values are then used to tune the low-resolution Hough transform, i.e.,

B_k = {(b_c', b_m') | a_kc, a_km, f_c(a_kc, b_c*) < δ_c ∧ f_m(a_km, b_m*) < δ_m}.

Thus the low resolution transform can be tuned to count only a subset of the high resolution
parameter nodes. The drawback of this technique is that it can only respond to a single value of
(b_c, b_m) in each range {b_c' ± .5Δ_c, b_m' ± .5Δ_m}. Thus either the high confidence parameter nodes
must be sufficiently sparse, or only one of the confusion classes can be examined at any one time.
This disadvantage is outweighed by being able to detect spatially-registered features and thus
circumvent the more severe problem discussed earlier.

6.  Tight Coupling

Most of the previous examples imply that the various Hough transforms are relatively
independent. That is, once the intrinsic images are computed, the transforms can be computed. The
general case is that this is not true; the intrinsic image contains global parameters which must be
computed using Hough transforms. Since the Hough transform requires an intrinsic image it might
seem that neither could be computed. In fact, both the Hough transform and the intrinsic images
can be computed by incorporating the Hough transforms into the parallel-iterative scheme used to
compute the intrinsic images. If the combined problem is well-conditioned: 1) the partial result for
the intrinsic image will be sufficient to produce a partial result for the Hough transform, and vice
versa; and 2) this process of using partial results in a parallel-iterative manner will converge. We
term this interdependence tight coupling and illustrate it with two examples.

In the first example, we show how a surface orientation intrinsic image can be computed from
intensity information. This example seems paradoxical at first since to compute surface orientation
one must know the location of the source of illumination and vice versa. We show how both these
computations can be conducted simultaneously with the partial result for the surface orientation
helping the illumination angle determination, and the partial result for the illumination angle
helping the surface orientation determination. The illumination angle is determined by a Hough
transform.

In the second example, we show that a three-dimensional flow field can be segmented into
groups of vectors that represent general rigid body motion. The problem here is that an individual
field vector v(x) is an unknown sum of rotational and translational components, i.e., v(x) = v_Ω(x)
+ v_T(x). These components can only be determined by knowing global rigid body motion
parameters. However, these parameters can be determined only if v(x) is partitioned into v_Ω(x) and
v_T(x). As in the earlier example, this problem can be resolved by a parallel-iterative scheme which
computes both the global parameters and the velocity-field decomposition simultaneously.

Rather than being isolated examples, tight coupling is believed to be the general case.
Extending the scope of the parallel-iterative computation is the general solution.

6.1  Shape from Shading by Relaxation

Given the orientation of a surface with respect to a viewer, its reflectance properties and the
location of a single light source, the brightness at a point on the viewer's retina can be
determined. That is, the reflectance function R(θ,φ,θ_s,φ_s), where (θ,φ) and (θ_s,φ_s) are orientations of
the surface and source respectively, allows us to determine I(x,y), the intensity in terms of retinal
coordinates [Horn and Sjoberg, 1978]. The form of R is assumed to be known. However, the
perceptual problem is the reverse: given I(x,y) and R(.,.), determine θ(x,y), φ(x,y) and θ_s, φ_s.

In general, the problem of deriving θ(x,y), φ(x,y) and θ_s, φ_s is underdetermined. However,
Ikeuchi [1980] showed that the surface could be determined locally once θ_s, φ_s was specified. This
method has been extended [Ballard, 1981b] to the case where θ_s, φ_s is initially unknown.

The algorithm is outlined as follows. For a single light source, the intensity at a point on a
retina can be described in terms of the orientation of the normal of the corresponding surface point
and the source orientation. That is, in spherical notation,

I(x,y) = R(θ,φ,θ_s,φ_s)

where the angles θ and φ are functions of x and y. Now by minimizing (I-R)² and appending a
smoothness constraint on θ and φ we have [Ikeuchi, 1980] an expression for the local error (if the
estimate for θ and φ is unreliable) as follows:

E(x,y) = (I-R)² + λ((∇²θ)² + (∇²φ)²)

where λ is a Lagrange multiplier. For a minimum, E_θ = 0 and E_φ = 0. Skipping some steps, this leads
to

φ(x,y) = φ_avg(x,y) + T(x,y) R_φ

θ(x,y) = θ_avg(x,y) + T(x,y) R_θ
where φ_avg(x,y) is a local average and
T(x,y) = (1/16λ)(I-R)

In solving these equations, we assume θ_s and φ_s are known. An iterative method is used where the
φ_avg and θ_avg are calculated from a previous iteration.

To calculate θ_s and φ_s, we assume θ and φ are known and use a Hough technique. First we
form an array H[θ_s,φ_s] of possible values of θ_s and φ_s initialized to zero. Now we can solve the
reflectance equation for φ_s. The Hough technique works as follows. For each surface element θ,φ,
and for each θ_s we calculate φ_s and increment H[θ_s,φ_s], i.e., H[θ_s,φ_s] := H[θ_s,φ_s] + 1. After all
surface elements have been processed, the maximum value of H corresponds to the location of the
point source. In [Ballard, 1981b] it is shown that calculation of the source location can proceed in
parallel with that of θ(x,y) and φ(x,y) and that the two calculations will converge.
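
The voting step can be illustrated in one dimension. The sketch below is not the reflectance function of the text: it assumes a 1-D Lambertian model I = cos(θ - θ_s) purely as a stand-in for R, with hypothetical surface angles and a known true source angle, and shows how solving the reflectance equation for the source parameter yields votes that pile up at the true value.

```python
import math
from collections import Counter


def source_angle_hough(elements, step=0.05):
    """Vote for the source angle theta_s given (theta, intensity) pairs.

    Assumes the 1-D Lambertian model I = cos(theta - theta_s), which is an
    illustrative stand-in for the reflectance function R.
    """
    H = Counter()
    for theta, intensity in elements:
        spread = math.acos(max(-1.0, min(1.0, intensity)))
        # Solving I = cos(theta - theta_s) gives two candidate source angles.
        for theta_s in (theta - spread, theta + spread):
            H[round(theta_s / step)] += 1
    return H


true_source = 0.4
surface = [-0.5, -0.1, 0.2, 0.7, 1.0]        # hypothetical surface angles
elements = [(t, math.cos(t - true_source)) for t in surface]
H = source_angle_hough(elements)
best_bin, votes = H.most_common(1)[0]
print(best_bin * 0.05, votes)  # 0.4 5
```

Each surface element contributes one correct and one spurious candidate; only the correct candidates coincide, which is why the maximum identifies the source.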

Results for the one-dimensional case are shown in Figure 11 for the case of a small surface
"bubble." Figure 11 shows the surface convergence, as well as the convergence of the illumination
angle.


Figure 11: (a) Shading (top left curve).

(b) Surface convergence (colored points immediately below (a)).

(c) Illumination angle Hough transform (bottom left).

(d) Illumination angle convergence (upper right).

It is important to remember that the boundary conditions in this problem have been provided a
priori; in this case they are the orientation of the surface at the boundary of the bubble. Generally,
these will have to be determined by multiple intrinsic image relaxations, as mentioned in Section 2.

6.2  3-D Rigid Body Motion

The general motion of a rigid body can be described by eight parameters: three for
translational velocity v_T; three for angular velocity Ω; and two for the location of the axis of
rotation r. We describe the detection of rigid body motion in three parts, each of which uses Hough
transforms. First, we show how to detect pure translation (v_T). Next we show how to detect pure
rotation (Ω, r). Finally, we show that a 3-d flow vector can be iteratively decomposed into a
translational component and a rotational component. These components are described by the
parameters (v_T, Ω, r).

Pure Translational Motion

This case is very simple. If a rigid body is translating with velocity v_T, then a point on the
body at location x will have velocity v(x) = v_T. To detect this take the Hough transform given by
<(x, v(x)), (v_T), (v(x) - v_T = 0)>. The maximum value in H(v_T) will correspond to the translational
velocity.
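
A minimal sketch of this transform: each flow vector votes for its own quantized velocity and the peak is read off. The flow vectors, the two outliers, and the cell size of 0.5 are hypothetical.

```python
from collections import Counter


def translation_hough(flow, step=0.5):
    """H(v_T): every flow vector votes for its own quantized velocity."""
    H = Counter()
    for v in flow:
        H[tuple(round(c / step) for c in v)] += 1
    return H


# Hypothetical 3-d flow: four vectors from a translating body plus two outliers.
flow = [(1.0, 0.5, 0.0)] * 4 + [(0.0, 2.0, 1.0), (-1.5, 0.0, 0.5)]
cell, votes = translation_hough(flow).most_common(1)[0]
v_T = tuple(c * 0.5 for c in cell)
print(v_T, votes)  # (1.0, 0.5, 0.0) 4
```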

Pure  Rotational  Motion 

In the case of pure rigid-body rotation, each point moves about an axis in space such that

v(x) = Ω × ρ(x),  (6.1)

where v, Ω, and ρ are all orthogonal and ρ(x) is a vector from the point x to the axis of rotation
such that

ρ × (Ω × v) = 0.

That is, ρ is defined so as to be perpendicular to Ω and v.

One problem is to specify the axis of rotation. This is done using a vector r which is the
smallest vector from the origin to the rotation axis (see Figure 12).


Figure  12. 


The pure rotation case involves five parameters: three for the vector Ω and two to specify the
axis of rotation. A standard Hough technique would involve a transformation from (x, v(x)) to
(Ω, r') using Eq. (6.1). Only a vector r' equal to any two components of r is necessary since Ω · r =
0. However, a five-dimensional space is large, thus we are motivated to decompose the parameter
space (Ω, r) into two smaller spaces [Ballard and Sabbah, 1981]. One space is composed of two
components ω_x and ω_y of a unit vector ω which defines the direction of Ω. The other is composed
of the magnitude of Ω and two components of r.

Since ω must be perpendicular to v,
ω · v = 0.

Furthermore, |ω| = 1. Combining these two equations leads to
ω_x v_x + ω_y v_y + (1 - ω_x² - ω_y²)^(1/2) v_z = 0  (6.2)

which is a quadratic equation in unknowns ω_x and ω_y. Thus the direction of the rotation vector
may be found from the Hough transform

<(v(x)), (ω_x, ω_y), (Eq. (6.2))>.
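
A sketch of this direction-finding transform follows. It votes over a coarse grid of (ω_x, ω_y) for unit vectors perpendicular to each flow vector. The grid resolution, the positive sign chosen for ω_z, the exact-match tolerance (workable only because the toy values are exactly representable), and the flow data are all assumptions made for illustration.

```python
import math
from collections import Counter


def rotation_direction_hough(flow, n=4, tol=1e-9):
    """Vote over a grid of (w_x, w_y) for unit vectors perpendicular to each v.

    The third component is taken as w_z = +sqrt(1 - w_x^2 - w_y^2), an assumed
    sign convention; grid cells outside the unit disc are skipped.
    """
    H = Counter()
    grid = [i / n for i in range(-n, n + 1)]
    for vx, vy, vz in flow:
        for wx in grid:
            for wy in grid:
                rest = 1.0 - wx * wx - wy * wy
                if rest < 0:
                    continue
                wz = math.sqrt(rest)
                if abs(wx * vx + wy * vy + wz * vz) <= tol:
                    H[(wx, wy)] += 1
    return H


# Hypothetical flow from a rotation about the z-axis: all vectors lie in the x-y plane.
flow = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
print(rotation_direction_hough(flow).most_common(1)[0])  # ((0.0, 0.0), 3)
```

The peak at (ω_x, ω_y) = (0, 0) corresponds to ω = (0, 0, 1), the z-axis. A practical version would tie the tolerance to the cell size rather than demand an exact match.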

Once ω is known, it can be used in the following series of equations. If |Ω| is the magnitude of
the rotation vector, the vector s given by

s = x - ω × v/|Ω|

is on the rotation axis. Furthermore, r is given by
r = s - (s · ω)ω

so that

r = (x - ω × v/|Ω|) - (x · ω)ω.  (6.3)

This equation can be used to determine the first two components of r given a value for |Ω|. Thus
we can determine |Ω| and r from the following Hough transform:

<(x, v(x), ω), (r_x, r_y, |Ω|), (Eq. (6.3))>.
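
Eq. (6.3) can be checked on a single flow vector. The example below uses a hypothetical rotation of magnitude 2 about a z-directed axis through (1, 0, 0) and follows the sign convention of Eq. (6.1), with ρ pointing from x to the axis. The helper names are illustrative only.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def axis_offset(x, v, omega_dir, omega_mag):
    """Eq. (6.3): r = (x - w x v / |Omega|) - (x . w) w."""
    wv = cross(omega_dir, v)
    s = tuple(xi - ci / omega_mag for xi, ci in zip(x, wv))  # point on the axis
    k = dot(s, omega_dir)  # equals x . w, since (w x v) is perpendicular to w
    return tuple(si - k * wi for si, wi in zip(s, omega_dir))


# With rho = (-2, 0, 0) pointing from x to the axis, Eq. (6.1) gives v = Omega x rho.
x = (3.0, 0.0, 5.0)
v = cross((0.0, 0.0, 2.0), (-2.0, 0.0, 0.0))
print(axis_offset(x, v, (0.0, 0.0, 1.0), 2.0))  # (1.0, 0.0, 0.0)
```

In the Hough transform, this computation would be repeated for each candidate |Ω|, with a vote cast at (r_x, r_y, |Ω|) each time.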

General Rigid Body Motion

Finally, suppose the motion is completely general so that
v(x) = v_T(x) + v_Ω(x).

Since only v(x) can be measured, how can one determine how much is translational velocity and
how much is rotational velocity? One possibility is to dynamically partition the velocity into two
components v_T(x) and v_Ω(x) which give the most consistent global parameters (v_T, Ω and r). This
would work as follows:

Step 0. Assume v_Ω(x) = v(x).

Step 1. Use the Hough transforms to estimate (Ω, r).

Step 2. Use (Ω, r) to determine v_Ω(x).

Step 3. Compute v_T(x) = v(x) - v_Ω(x).

Step 4. Use the Hough transform to estimate v_T.

Step 5. Compute v_Ω(x) = v(x) - v_T(x).

Step 6. If v_Ω(x) has not converged, go to Step 1.
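
The control structure of Steps 0 through 6 can be sketched as a loop that alternates between the two estimates. To keep the example short and runnable, the estimators below are not Hough transforms: a 2-D least-squares fit of rotation about the origin and a simple average stand in for them, and the four sample points are hypothetical. Only the alternation itself is meant to reflect the steps above.

```python
def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def decompose_flow(v, fit_rotation, fit_translation, max_iter=20, tol=1e-9):
    """Alternate Steps 1-5 until the rotational field stops changing (Step 6).

    fit_rotation and fit_translation are hypothetical callables standing in
    for the Hough estimates of (Omega, r) and v_T respectively.
    """
    v_rot = dict(v)                                           # Step 0
    v_T = None
    for _ in range(max_iter):
        model = fit_rotation(v_rot)                           # Steps 1-2
        v_T = fit_translation({x: sub(v[x], model[x]) for x in v})  # Steps 3-4
        new_rot = {x: sub(v[x], v_T) for x in v}              # Step 5
        change = max(abs(a - b) for x in v
                     for a, b in zip(new_rot[x], v_rot[x]))
        v_rot = new_rot
        if change < tol:                                      # Step 6
            break
    return v_T, v_rot


def fit_rotation_2d(field):
    """Stand-in estimator: least-squares angular speed about the origin (2-D)."""
    w = (sum(x * vy - y * vx for (x, y), (vx, vy) in field.items())
         / sum(x * x + y * y for x, y in field))
    return {(x, y): (-w * y, w * x) for x, y in field}


def fit_translation_mean(field):
    """Stand-in estimator: average of the residual vectors."""
    n = len(field)
    return tuple(sum(c) / n for c in zip(*field.values()))


# Hypothetical flow: angular speed 2 about the origin plus translation (3, -1).
points = [(1, 0), (-1, 0), (0, 1), (0, -1)]
v = {(x, y): (-2.0 * y + 3.0, 2.0 * x - 1.0) for x, y in points}
v_T, v_rot = decompose_flow(v, fit_rotation_2d, fit_translation_mean)
print(v_T)  # (3.0, -1.0)
```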
7.  Discussion

The key ideas of this paper are summarized in the introduction. Here we mention other ideas
which do not fit easily under any one of the previous headings.

1) The Intrinsic Image/Feature Space Duality. By distinguishing between image fields and
image features we know when relaxation is the more important tool and when the Hough transform
is more important.

2) Unit/value. By reducing the underlying primitives to units of extreme simplicity we can
algorithmically determine the connection patterns to represent m-ary relations.

3) Massive Parallelism. By assuming the availability of massive parallel computation, we reduce
the need for sequential processing to more essential cases. For example, we use sequential
processing in Section 5 to resolve real ambiguities in the input.

4) Extensibility. The representation is very general, being m-ary consistency relations, and can
be extended to other domains besides vision, and to arbitrary levels of abstraction within vision.